Variable selection in model-based clustering using multilocus genotype data

Authors

  • Wilson Toussile
  • Elisabeth Gassiat
Abstract

We propose a variable selection procedure for model-based clustering of multilocus genotype data. Some loci may not be relevant for clustering individuals into statistically different populations. Inferring the number K of clusters and the relevant subset S of clustering loci is treated as a model selection problem. The competing models are compared using penalized maximum likelihood criteria. Under weak assumptions on the penalty function, we prove the consistency of the resulting estimator (K̂n, Ŝn). An associated algorithm named Mixture Model for Genotype Data (MixMoGenD) was implemented in the C++ programming language and is available at www.math.u-psud.fr/~toussile. To avoid an exhaustive search for the optimum model, we propose an adaptation of the Backward-Stepwise algorithm, which enables a better search for the optimum model among all possible cardinalities of S. We present numerical experiments on simulated and real datasets that highlight the value of our loci selection procedure.
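The search over subsets S described in the abstract can be illustrated with a minimal sketch. This is not the MixMoGenD implementation and not the authors' adaptation of Backward-Stepwise: it is a plain greedy backward elimination, assuming the number of clusters K is held fixed and that a function `criterion(S)` returns the penalized criterion (for example, minus the maximized log-likelihood plus a penalty) for a subset S of loci, with lower values being better. The names `backward_stepwise` and `toy_criterion`, and the toy scores, are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_stepwise(
    loci: Iterable[int],
    criterion: Callable[[FrozenSet[int]], float],
) -> Tuple[FrozenSet[int], float]:
    """Greedy backward elimination over subsets of loci.

    Starts from the full set, drops one locus at a time, and returns the
    lowest-criterion subset seen across all visited cardinalities.
    """
    current = frozenset(loci)
    best, best_score = current, criterion(current)
    while len(current) > 1:
        # Score every subset obtained by dropping exactly one locus.
        candidates = [
            (criterion(current - {locus}), current - {locus})
            for locus in sorted(current)
        ]
        # Move to the best of these, even if it is worse than the current
        # subset, so that every cardinality down to 1 is visited.
        score, current = min(candidates, key=lambda c: c[0])
        if score < best_score:
            best, best_score = current, score
    return best, best_score


def toy_criterion(subset: FrozenSet[int]) -> float:
    """Hypothetical stand-in for a penalized criterion (not a real likelihood).

    Loci 0 and 2 are treated as the relevant ones: each lowers the score by
    10, while every selected locus adds a penalty of 1.
    """
    relevant = frozenset({0, 2})
    return -10.0 * len(subset & relevant) + 1.0 * len(subset)


best_subset, best_value = backward_stepwise(range(4), toy_criterion)
```

In a real setting, `criterion` would fit the mixture model for the given subset and return the penalized value, and the search would be repeated for each candidate K. A greedy search of this kind evaluates on the order of L² subsets for L loci rather than all 2^L, at the cost of possibly missing the global optimum.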


Similar resources

Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...


Multivariate Estimation of Rock Mass Characteristics Respect to Depth Using ANFIS Based Subtractive Clustering- Khorramabad- Polezal Freeway Tunnels

A combination of Adaptive Network-based Fuzzy Inference System (ANFIS) and subtractive clustering (SC) has been used for estimation of deformation modulus (Em) and rock mass strength (UCSm) considering depth of measurement. To do this, learning of the ANFIS based subtractive clustering (ANFISBSC) was performed firstly on 125 measurements of 9 variables such as rock mass strength (UCSm), deformati...


A Simple Model-based Approach to Variable Selection in Classification and Clustering

Clustering and classification of replicated data is often performed using classical techniques that inappropriately treat the data as unreplicated, or by complex modern ones that are computationally demanding. In this paper we introduce a simple approach based on a ‘spike-and-slab’ mixture model that is fast, automatic, allows classification, clustering and variable selection in a single framew...


Inference of population structure using multilocus genotype data.

We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or...


Empirical evaluation of genetic clustering methods using multilocus genotypes from 20 chicken breeds.

We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly...



Journal title:
  • Adv. Data Analysis and Classification

Volume 3  Issue 

Pages  -

Publication date 2009